 dynamic program




On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

Neural Information Processing Systems

Risk-averse reinforcement learning (RL) seeks to provide a risk-averse policy for high-stakes real-world decision problems. These high-stakes domains include autonomous driving (Jin et al., 2019; Sharma et al., 2020), robot collision avoidance (Ahmadi et al., 2021; Hakobyan and Yang, 2021),



Accelerating ERM for data-driven algorithm design using output-sensitive techniques

Neural Information Processing Systems

Data-driven algorithm design is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of computationally efficient data-driven algorithms for combinatorial algorithm families with multiple parameters.



A Dynamic Programs For SSK Evaluations and Gradients. We now detail recursive calculation strategies for calculating k_n(a, b) and its gradients with O(nl

Neural Information Processing Systems

A recursive strategy can efficiently calculate the contributions of a particular substring by pre-calculating the contributions of the smaller sub-strings contained within the target string. Context-free grammars (CFG) are 4-tuples G = (V, Σ, R, S), consisting of: a set of non-terminal symbols V, a set of terminal symbols Σ (also known as an alphabet), a set of production rules R, and a non-terminal starting symbol S from which all strings are generated. The CFG for the symbolic regression task of Section 5.3 is given by the following rules: S → S '+' T, S → S ' ' T, S → S '/' T, S → T, T → '(' S ')', T → 'sin(' S ')', T → 'exp(' S ')', T → 'x', T → '1', T → '2', T → '3'. We now provide implementation details for our GA acquisition function optimizers. The GA begins with a randomly sampled population and ends once the best string in the population stops improving between iterations (Algorithm 1). Although seemingly simple tasks, our synthetic string optimization tasks of Section 5.1 are deceptively We now provide comprehensive experimental results across the synthetic string optimization tasks.
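To make the 4-tuple definition concrete, here is a minimal Python sketch of drawing random strings from a grammar shaped like the one quoted above. It is an illustration only, not the authors' implementation. The names `RULES`, `SHORTEST`, and `sample`, and the depth cap `max_depth`, are assumptions introduced here. The binary operator that is illegible in this listing is left out rather than guessed.

```python
import random

# Production rules R, keyed by non-terminal (V = {"S", "T"}).
# Each alternative is a list of symbols; anything not in RULES is a terminal.
# Assumption: the binary operator that is illegible in the listing is omitted.
RULES = {
    "S": [["S", "+", "T"], ["S", "/", "T"], ["T"]],
    "T": [["(", "S", ")"], ["sin(", "S", ")"], ["exp(", "S", ")"],
          ["x"], ["1"], ["2"], ["3"]],
}

# Hypothetical fallback productions used once the depth budget is spent,
# so that every derivation is guaranteed to terminate.
SHORTEST = {"S": ["T"], "T": ["x"]}


def sample(symbol="S", depth=0, max_depth=6, rng=random):
    """Expand `symbol` into a string of terminals by random derivation."""
    if symbol not in RULES:  # terminal symbol: emit it unchanged
        return symbol
    if depth >= max_depth:
        production = SHORTEST[symbol]
    else:
        production = rng.choice(RULES[symbol])
    return "".join(sample(s, depth + 1, max_depth, rng) for s in production)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample(rng=rng))
```

Because the rule `S → S '+' T` is left-recursive, unconstrained random expansion may not terminate. The depth cap is one simple way to bound it; other choices, such as weighting productions, would work equally well.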


Symmetric Policy Design for Multi-Agent Dispatch Coordination in Supply Chains

arXiv.org Artificial Intelligence

We study a decentralized dispatch coordination problem in a multi-agent supply chain setting with shared logistics capacity. We propose symmetric (identical) dispatch strategies for all agents, enabling efficient coordination without centralized control. Using a common information approach, we derive a dynamic programming solution that computes optimal symmetric dispatch strategies by transforming the multi-agent problem into a tractable dynamic program on the agents' common information state. Simulation results demonstrate that our method significantly reduces coordination cost compared to baseline heuristics, including belief-based strategies and an always-dispatch policy. These findings highlight the benefits of combining symmetric strategy design with a common information-based dynamic programming framework for improving multi-agent coordination performance.